NEARLY ROOT-n APPROXIMATION FOR REGRESSION
QUANTILE PROCESSES

By Stephen Portnoy¹

University of Illinois at Urbana- Champaign 

Traditionally, assessing the accuracy of inference based on regression
quantiles has relied on the Bahadur representation. This provides
an error of order $n^{-1/4}$ in normal approximations, and suggests that
inference based on regression quantiles may not be as reliable as that
based on other (smoother) approaches, whose errors are generally of
order $n^{-1/2}$ (or better in special symmetric cases). Fortunately, extensive
simulations and empirical applications show that inference for
regression quantiles shares the smaller error rates of other procedures.
In fact, the "Hungarian" construction of Komlos, Major and Tusnady
[Z. Wahrsch. Verw. Gebiete 32 (1975) 111-131, Z. Wahrsch. Verw.
Gebiete 34 (1976) 33-58] provides an alternative expansion for the
one-sample quantile process with nearly the root-n error rate (specifically,
to within a factor of $\log n$). Such an expansion is developed here
to provide a theoretical foundation for more accurate approximations 
for inference in regression quantile models. One specific application 
of independent interest is a result establishing that for conditional 
inference, the error rate for coverage probabilities using the Hall and 
Sheather [J. R. Stat. Soc. Ser. B Stat. Methodol. 50 (1988) 381-391] 
method of sparsity estimation matches their one-sample rate. 

Received August 2011; revised May 2012.
¹Supported in part by NSF Grant DMS-10-07396.

AMS 2000 subject classifications. Primary 62E20, 62J99; secondary 60F17.
Key words and phrases. Regression quantiles, asymptotic approximation, Hungarian
construction.

This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Statistics,
2012, Vol. 40, No. 3, 1714-1736. This reprint differs from the original in
pagination and typographic detail.

1. Introduction. Consider the classical regression quantile model: given
independent observations $\{(x_i, Y_i): i = 1, \ldots, n\}$, with $x_i \in R^p$ fixed (for fixed $p$),
the conditional quantile of the response $Y_i$ given $x_i$ is

$$Q_{Y_i}(\tau \mid x_i) = x_i'\beta(\tau).$$

Let $\hat\beta(\tau)$ be the Koenker-Bassett regression quantile estimator of $\beta(\tau)$.
Koenker (2005) provides definitions and basic properties, and describes the
traditional approach to asymptotics for $\hat\beta(\tau)$ using a Bahadur representation:

$$B_n(\tau) \equiv n^{1/2}(\hat\beta(\tau) - \beta(\tau)) = D(x)W(\tau) + R_n,$$

where $W(\tau)$ is a Brownian Bridge and $R_n$ is an error term.

Unfortunately, $R_n$ is of order $n^{-1/4}$ [see, e.g., Jureckova and Sen (1996)
and Knight (2002)]. This might suggest that asymptotic results are accurate
only to this order. However, both simulations in regression cases
and one-dimensional results [Komlos, Major and Tusnady (1975, 1976)] justify
a belief that regression quantile methods should share (nearly) the
$O(n^{-1/2})$ accuracy of smooth statistical procedures (uniformly in $\tau$). In
fact, as shown in Knight (2002), $n^{1/4}R_n$ has a limit with zero mean
that is independent of $W(\tau)$. Thus, in any smooth inferential procedure
(say, confidence interval lengths or coverages), this error term should enter
only through $ER_n^2 = O(n^{-1/2})$. Nonetheless, this expansion would still leave
an error of $o(n^{-1/4})$ (coming from the error beyond the $R_n$ term in the
Bahadur representation), and so would still fail to reflect root-n behavior.
Furthermore, previous results only provide such a second-order expansion
for fixed $\tau$.

It must be noted that the slower $O(n^{-1/4})$ error rate arises from the
discreteness introduced by indicator functions appearing in the gradient
conditions. In fact, expansions can be carried out when the design is assumed
to be random; see De Angelis, Hall and Young (1993) and Horowitz
(1998), where the focus is on analysis of the $(x, Y)$ bootstrap. Specifically,
the assumption of a smooth distribution for the design vectors together with 
a separate treatment of the lattice contribution of the intercept does permit 
appropriate expansions. Unfortunately, the randomness in X means that all 
inference must be in terms of the average asymptotic distribution (averaged 
over X), and so fails to apply to the generally more desirable conditional 
forms of inference. Specifically, unconditional methods may be quite poor 
in the heteroscedastic and nonsymmetric cases for which regression quantile 
analysis is especially appropriate. The main goal of this paper is to reclaim 
increased accuracy for conditional inference beyond that provided by the 
traditional Bahadur representation. 

Specifically, the aim is to provide a theoretical justification for an error
bound of nearly root-n order uniformly in $\tau$. Define

$$\hat\delta_n(\tau) = \sqrt{n}(\hat\beta(\tau) - \beta(\tau)).$$

We first develop a normal approximation for the density of $\hat\delta_n$ with the
following form:

$$f_{\hat\delta}(\delta) = \varphi_{\Sigma}(\delta)(1 + O(L_n n^{-1/2}))$$

for $\|\delta\| \le D\sqrt{\log n}$, where $L_n = (\log n)^{3/2}$. We then extend this result to the
densities of a pair of regression quantiles in order to obtain a "Hungarian"
construction [Komlos, Major and Tusnady (1975, 1976)] that approximates
the process $B_n(\tau)$ by a Gaussian process to order $O(L_n^* n^{-1/2})$, where $L_n^* =
(\log n)^{5/2}$ (uniformly for $\varepsilon \le \tau \le 1 - \varepsilon$).

Section 2 provides some applications of the results here to conditional
inference methods in regression quantile models. Specifically, an expansion
is developed for coverage probabilities of confidence intervals based on the
Hall and Sheather (1988) difference quotient estimator of the sparsity function.
The coverage error rate is shown to achieve the rate $O(n^{-2/3}\log n)$ for
conditional inference, which is nearly the known "optimal" rate obtained for
a single sample and for unconditional inference. Section 3 lists the conditions
and main results, and offers some remarks. Section 4 provides a description
of the basic ingredients of the proof (since this proof is rather long
and complicated). Section 5 proves the density approximation for a fixed $\tau$
(with multiplicative error). Section 6 extends the result to pairs of regression
quantiles (Theorem 1), and Section 7 provides the "Hungarian" construction
(Theorem 2) with what appears to be a somewhat innovative induction
along dyadic rationals.

2. Implications for applications. As the impetus for this work was the 
need to provide some theoretical foundation for empirical results on the 
accuracy of regression quantile inference, some remarks on implications are 
in order. 

Remark 1. Clearly, whenever published work assesses the accuracy of
an inferential method using the error term from the Bahadur representation,
the present results will immediately provide an improvement from $O(n^{-1/4})$
to the nearly root-n rate here. One area of such results is methods based
directly on regression quantiles and not requiring estimation of the sparsity
function $[1/f(F^{-1}(\tau))]$. There are several papers giving such results,
although at present it appears that their methods have theoretical justification
only under location-scale forms of quantile regression models.

Specifically, Zhou and Portnoy (1996) introduced confidence intervals (es- 
pecially for fitted values) based on using pairs of regression quantiles in a way 
analogous to confidence intervals for one-sample quantiles. They showed that 
the method was consistent, but the accuracy depended on the Bahadur error 
term. Thus, results here now provide accuracy to the nearly root-n rate of 
Theorem 2. 

A second approach directly using the dual quantile process is based on 
the regression ranks of Gutenbrunner et al. (1993). Again, the error terms in 
the theoretical results there can be improved using Theorem 1 here, though 
the development is not so direct. 

For a third application, Neocleous and Portnoy (2008) showed that the
regression quantile process interpolated along a grid of mesh strictly larger
than $n^{-1/2}$ is asymptotically equivalent to the full regression quantile process
to first order, but (because of additional smoothness) will yield monotonic
quantile functions with probability tending to 1. However, their development
used the Bahadur representation, which indicated that a mesh of order $n^{-1/3}$
balanced the bias and accuracy and bounded the difference between $\hat\beta(\tau)$
and its linear interpolate by nearly $O(n^{-2/3})$. With some work, use of the
results here would permit a mesh slightly larger than the nearly root-n rate
here to obtain an approximation of nearly root-n order.

Remark 2. Inference under completely general regression quantile models
appears to require either estimation of the sparsity function or use of
resampling methods. The most general methods in the quantreg package
[Koenker (2012)] use the "difference quotient" method with the Hall and
Sheather (1988) bandwidth of order $n^{-1/3}$, which is known to be optimal
for coverage probabilities in the one-sample problem. As noted above, expansions
using the randomness of the regressors can be developed to provide
analogous results for unconditional inference. The results here (with some
elaboration) can be used to show that the Hall-Sheather estimates provide
(nearly) the same rates of accuracy for coverage probabilities under the conditional
form of the regression quantile model.

To be specific, consider the problem of confidence interval estimation for
a fixed linear combination of regression parameters: $a'\beta(\tau)$. The asymptotic
variance is the well-known sandwich formula

$$s_a^2(\delta) = \tau(1 - \tau)\,a'(X'DX)^{-1}(X'X)(X'DX)^{-1}a, \qquad D \equiv \operatorname{diag}(x_i'\delta)^{-1}, \tag{2.1}$$

where $\delta$ is the sparsity, $\delta = \beta'(\tau)$ (with $\beta'$ being the gradient), and where $X$
is the design matrix.
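As a rough illustration of how (2.1) can be evaluated, the following sketch computes the sandwich variance for a given design. The function name and the use of NumPy are assumptions made here for illustration, and the formula is taken exactly as written in (2.1), with no additional normalization by $n$.

```python
import numpy as np

def sandwich_variance(X, delta, a, tau):
    """Evaluate s_a^2(delta) as written in (2.1) (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta, dtype=float)
    a = np.asarray(a, dtype=float)
    # D = diag(x_i' delta)^{-1}: reciprocal of the sparsity at each design point.
    d = 1.0 / (X @ delta)
    XDX = X.T @ (d[:, None] * X)      # X' D X (symmetric)
    XX = X.T @ X                      # X' X
    v = np.linalg.solve(XDX, a)       # (X' D X)^{-1} a
    return tau * (1.0 - tau) * float(v @ XX @ v)
```

When $x_i'\delta$ is the same constant $s$ for every $i$, the expression reduces to $\tau(1-\tau)s^2 a'(X'X)^{-1}a$, which gives a simple sanity check.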

Following Hall and Sheather (1988), the sparsity may be approximated
by the difference quotient $\tilde\delta = (\beta(\tau + h) - \beta(\tau - h))/(2h)$. Standard
approximation theory (using the Taylor series) shows that

$$\tilde\delta = \delta + O(h^2).$$

The sparsity may be estimated by

$$\hat\delta = \hat\Delta(h)/(2h) \equiv (\hat\beta(\tau + h) - \hat\beta(\tau - h))/(2h), \tag{2.2}$$

and the variance (2.1) may be estimated by inserting $\hat\delta$ in $D$.
Then, as shown in the Appendix, the confidence interval

$$a'\beta(\tau) \in a'\hat\beta(\tau) \pm z_\alpha s_a(\hat\delta) \tag{2.3}$$

has coverage probability $1 - 2\alpha + O((\log n)n^{-2/3})$, which is within a factor
of $\log n$ of the optimal Hall-Sheather rate in a single sample. Furthermore,
this rate is achieved at the (optimal) $h$-value $h_n^* = c\sqrt{\log n}\,n^{-1/3}$, which is
the optimal Hall-Sheather bandwidth except for the $\sqrt{\log n}$ term.

The optimal constant for $h_n^*$ cannot be determined here, as it can when $X$
is allowed to be random [and for which the $O(1/(nh_n))$ term is explicit].
This appears to be an inherent shortcoming for using inference conditional
on the design.
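The difference quotient idea is easiest to see in the one-sample case. The sketch below is only an illustration under stated assumptions: it uses sample quantiles from NumPy rather than regression quantiles, the function name is hypothetical, and the bandwidth constant `c` is an arbitrary choice (the $\sqrt{\log n}$ factor is omitted).

```python
import numpy as np

def sparsity_difference_quotient(y, tau, c=1.0):
    """One-sample difference quotient estimate of the sparsity 1/f(F^{-1}(tau))."""
    y = np.asarray(y, dtype=float)
    n = y.size
    h = c * n ** (-1.0 / 3.0)          # bandwidth of order n^{-1/3}
    if not (0.0 < tau - h and tau + h < 1.0):
        raise ValueError("tau +/- h must stay inside (0, 1)")
    upper, lower = np.quantile(y, [tau + h, tau - h])
    return (upper - lower) / (2.0 * h)
```

For standard normal data at $\tau = 1/2$ the target value is $\sqrt{2\pi}$, so a large simulated sample should give an estimate close to that number.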

Note also that it is possible to obtain better error rates for the coverage
probability by using higher order differences. Specifically, using the notation
of (2.2) with $\Delta(h) = \beta(\tau + h) - \beta(\tau - h)$,

$$\frac{2}{3h}\Delta(h) - \frac{1}{12h}\Delta(2h) = \beta'(\tau) + O(h^4).$$

As a consequence, the optimal bandwidth for this estimator is of order $n^{-1/5}$,
and the coverage probability is accurate to order $n^{-4/5}$ (except for logarithmic
factors).

Remark 3. A third approach to inference applies resampling methods.
As noted in the Introduction, while the $(x, Y)$ bootstrap is available for
unconditional inference, the practicing statistician will generally prefer to use
inference conditional on the design. There are some resampling approaches
that can obtain such inference. One method is that of Parzen, Wei and Ying
(1994), which simulates the binomial variables appearing in the gradient
condition. Another is the "Markov Chain Marginal Bootstrap" of He and
Hu (2002) [see also Kocherginsky, He and Mu (2005)]. However, this method
also involves sampling from the gradient condition. The discreteness in the
gradient condition would seem to require the error term from the Bahadur
representation, and thus leads to poorer inferential approximation: the error
would be no better than order $n^{-1/2}$ even if it were the square of the Bahadur
error term. While some evidence for decent performance of these methods
comes from (rather limited) simulations, it is often noticed that these methods
perform somewhat more poorly than the other methods in the
quantreg package of Koenker (2012). Clearly, a more complete analysis of
inference for regression quantiles based on the more accurate stochastic
expansions here would be useful.

3. Conditions, fundamental theorems and remarks. Under the regression
quantile model of Section 1, the following conditions will be imposed:

Let $\dot{x}_i$ denote the coordinates of $x_i$ except for the intercept (i.e., the last
$p - 1$ coordinates, if there is an intercept). Let $\phi_i(t)$ denote the conditional
characteristic function of the random variable $\dot{x}_i(I(Y_i \le x_i'\beta(\tau) + \delta/\sqrt{n}) - \tau)$,
given $x_i$. Let $f_i(y)$ and $F_i(y)$ denote the conditional density and c.d.f. of $Y_i$
given $x_i$.

Condition X1. For any $\varepsilon > 0$, there is $\eta \in (0, 1)$ such that

$$\inf_{\|t\| > \varepsilon} \prod_i |\phi_i(t)| \le \eta^n \tag{3.1}$$

uniformly in $\varepsilon \le \tau \le 1 - \varepsilon$.

Condition X2. $\|x_i\|$ are uniformly bounded, and there are positive
definite $p \times p$ matrices $G = G(\tau)$ and $H$ such that for any $\varepsilon > 0$ (as $n \to \infty$)

$$G_n(\tau) \equiv \frac{1}{n}\sum_{i=1}^n f_i(x_i'\beta(\tau))\,x_i x_i' = G(\tau)(1 + O(n^{-1/2})), \tag{3.2}$$

$$H_n \equiv \frac{1}{n}\sum_{i=1}^n x_i x_i' = H(1 + O(n^{-1/2})) \tag{3.3}$$

uniformly in $\varepsilon \le \tau \le 1 - \varepsilon$.

Condition F. The derivative of $\log(f_i(y))$ is uniformly bounded on the
interval $\{y: \varepsilon \le F_i(y) \le 1 - \varepsilon\}$.

Two fundamental results will be developed here. The first result provides
a density approximation with multiplicative error of nearly root-n rate. A result
for fixed $\tau$ is given in Theorem 5, but the result needed here is a bivariate
approximation for the joint density of one regression quantile and the difference
between this one and a second regression quantile (properly normalized
for the difference in $\tau$-values).

Let $\varepsilon \le \tau_1 \le 1 - \varepsilon$ for some $\varepsilon > 0$, and let $\tau_2 = \tau_1 + a_n$ with $a_n > cn^{-b}$ for
some $b < 1$. Here, one may want to take $b$ near 1 [see remark (1) below],
though the basic result will often be useful for smaller values of $b$. Define

$$B_n = B_n(\tau_1) \equiv n^{1/2}(\hat\beta(\tau_1) - \beta(\tau_1)), \tag{3.4}$$

$$R_n = R_n(\tau_1, \tau_2) \equiv (na_n)^{1/2}[(\hat\beta(\tau_1) - \beta(\tau_1)) - (\hat\beta(\tau_2) - \beta(\tau_2))]. \tag{3.5}$$

Theorem 1. Under Conditions X1, X2 and F, there is a constant $D$
such that for $|B_n| \le D(\log n)^{1/2}$ and $|R_n| \le D(\log n)^{1/2}$, the joint density
of $R_n$ and $B_n$ at $\delta$ and $s$, respectively, satisfies

$$f_{R_n,B_n}(\delta, s) = \varphi_{\Gamma_n}(\delta, s)\bigl(1 + O((\log n)^{3/2}(na_n)^{-1/2})\bigr),$$

where $\varphi_{\Gamma_n}$ is a normal density with covariance matrix $\Gamma_n$, having the form
given in (7.3).

The second result provides the desired "Hungarian" construction:

Theorem 2. Assume Conditions X1, X2 and F. Fix $a_n = n^{-b}$ with
$b < 1$, and let $\{\tau_j\}$ be dyadic rationals with denominator less than $n^b$. Define
$B_n^*(\tau)$ to be the piecewise linear interpolant of $\{B_n(\tau_j)\}$ [as defined in (3.4)].
Then for any $\varepsilon > 0$, there is a (zero-mean) Gaussian process, $\{Z_n(\tau_j)\}$, defined
along the dyadic rationals $\{\tau_j\}$ and with the same covariance structure
as $B_n(\tau)$ (along $\{\tau_j\}$) such that its piecewise linear interpolant $Z_n^*(\tau)$ satisfies

$$\sup_{\varepsilon \le \tau \le 1 - \varepsilon} |B_n^*(\tau) - Z_n^*(\tau)| = O\left(\frac{(\log n)^{5/2}}{\sqrt{n}}\right)$$

almost surely.
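To make the grid in Theorem 2 concrete, the following sketch builds a set of dyadic rationals with denominator below $n^b$ and evaluates a piecewise linear interpolant on it. It is an illustration only: the function names are hypothetical, a single power of two is used as the common denominator, and `np.interp` handles one coordinate at a time.

```python
import numpy as np

def dyadic_grid(n, b, eps):
    """Dyadic rationals j / 2^m in [eps, 1 - eps], with 2^m the largest power of 2 below n^b."""
    m = int(np.ceil(b * np.log2(n))) - 1   # largest m with 2^m < n^b
    denom = 2 ** m
    taus = np.arange(1, denom) / denom
    return taus[(taus >= eps) & (taus <= 1.0 - eps)]

def interpolate(taus, values, tau):
    """Piecewise linear interpolant through (taus, values), evaluated at tau."""
    return np.interp(tau, taus, values)
```

For example, `dyadic_grid(1024, 0.5, 0.1)` uses denominator 16 and keeps the points $2/16, \ldots, 14/16$.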

Some remarks on the conditions and ramifications are in order: 

(1) The usual construction approximates $B_n(\tau)$ by a "Brownian Bridge"
process. Theorem 2 really only provides an approximation for the discrete
processes at a sufficiently sparse grid of dyadic rationals. That the piecewise
linear interpolants converge to the usual Brownian Bridge follows as in
Neocleous and Portnoy (2008). The critical impediment to getting a Brownian
Bridge approximation to $B_n(\tau)$ with the error in Theorem 2 is the square
root behavior of the modulus of continuity. This prevents approximating
the piecewise linear interpolant within an interval of length greater than
(roughly) order $1/n$ if a root-n error is desired. In order to approximate the
density of the difference in $B_n(\tau)$ over an interval between dyadic rationals,
the length of the interval must be at least of order $n^{-b}$ (for $b < 1$). Clearly, it
will be possible to approximate the piecewise linear interpolant by a Brownian
Bridge with error $\sqrt{a_n} = n^{-b/2}$, and thus to get arbitrarily close to the
value of $1/2$ for the exponent of $n$. For most purposes, it might be better to
state the final result as

$$\sup_{\varepsilon \le \tau \le 1 - \varepsilon} \|B_n(\tau) - Z(\tau)\| = O(n^{-a})$$

for any $a < 1/2$ (where $Z$ is the appropriate Brownian Bridge); but the
stronger error bound of Theorem 2 does provide a much closer analog of the
result for the one-sample (one-dimensional) quantile process.

(2) The one-sample result requires only the first power of $\log n$, which is
known to give the best rate for a general result. The extra addition of 3/2 in
the exponent is clearly needed for the density approximation, but this may
be only a technical assumption. Nonetheless, I conjecture that some extra
amount is needed in the exponent.

(3) Conditions X1 and X2 can be shown to hold with probability tending
to one under smoothness and boundedness assumptions of the distribution
of $x$. Nonetheless, the condition that $\|x\|$ be bounded seems rather strong
in the case of random $x$. It seems clear that this can be weakened, though
probably at the cost of getting a poorer approximation. For example, $\|x\|$
having exponentially small tails might increase the bound in Theorem 2 by
an additional factor of $\log n$, and algebraic tails are likely worse. However,
details of such results remain to be developed.

(4) Similarly, it should be possible to let $\varepsilon$, which defines the compact
subinterval of $\tau$-values, tend to zero. Clearly, letting $\varepsilon_n$ be of order $1/n$
would lead to extreme value theory and very different approximations. For
slower rates of convergence of $\varepsilon_n$, Bahadur expansions have been developed
[e.g., see Gutenbrunner et al. (1993)] and extension to the approximation
result in Theorem 2 should be possible. Again, however, this would most
likely be at the cost of a larger error term.

(5) The assumption that the conditional density of the response (given x) 
be continuous is required even for the usual first order asymptotics. However, 
one might hope to avoid Condition F, which requires a bounded derivative 
at all points. For example, the double exponential distribution does not 
satisfy this condition. It is likely that the proofs here can be extended to 
the case where the derivative does not exist on a finite set (or even on 
a set of measure zero), but dropping differentiability entirely would require 
a rather different approach. Furthermore, the apparent need for bounded 
derivatives in providing uniformity over $\tau$ in Bahadur expansions suggests
the possibility that some differentiability is required.

(6) Theorem 1 provides a bivariate normal density approximation with
error rate (nearly) $n^{-1/2}$ when $\tau_1$ and $\tau_2$ are fixed. When $a_n = \tau_2 - \tau_1 \to 0$,
of course, the error rate is larger. Note, however, that the slower convergence
rate when $a_n \to 0$ does not reduce the order of the error in the final
construction since the difference $D_n = \hat\beta(\tau_2) - \hat\beta(\tau_1)$ is of order $(na_n)^{-1/2}$.

4. Ingredients and outline of proof. The development of the fundamen- 
tal results (Theorems 1 and 2) will be presented in three phases. The first 
phase provides the density approximation for a fixed $\tau$, since some of the
more complicated features are more transparent in this case. The second
phase extends this result to the bivariate approximation of Theorem 1. The
final phase provides the "Hungarian" construction of Theorem 2. To clarify 
the development, the basic ingredients and some preliminary results will be 
presented first. 

Ingredient 1. Begin with the finite sample density for a regression
quantile [Koenker (2005), Koenker and Bassett (1978)]: assume $Y_i$ has a density,
$f_i(y)$, and let $\tau$ be fixed. Note that $\hat\beta(\tau)$ is defined by having $p$ zero
residuals (if the design is in general position). Specifically, there is a subset,
$h$, of $p$ integers such that $\hat\beta(\tau) = X_h^{-1}Y_h$, where $X_h$ has rows $x_i'$ for $i \in h$
and $Y_h$ has coordinates $Y_i$ for $i \in h$. Let $\mathcal{H}$ denote the set of all such $p$-element
subsets. Define

$$\hat\delta = \sqrt{n}(\hat\beta(\tau) - \beta(\tau)).$$

As described in Koenker (2005), the density of $\hat\delta$ evaluated at the argument
$\delta = \sqrt{n}(b - \beta(\tau))$ is given by

$$f_{\hat\delta}(\delta) = n^{-p/2}\sum_{h \in \mathcal{H}} \det(X_h)\,P\{S_n \in A_h\}\prod_{i \in h} f_i(x_i'\beta(\tau) + n^{-1/2}x_i'\delta). \tag{4.1}$$

Here, the event in the probability above is the event that the gradient
condition holds for a fixed subset, $h$: $S_n \in A_h$, where $A_h = X_hR$, with $R$
the rectangle that is the product of intervals $(\tau - 1, \tau)$ [see Theorem 2.1 of
Koenker (2005)], and where

$$S_n = S_n(h, \beta, \delta) = \sum_{i \notin h} x_i\bigl(I(Y_i \le x_i'\beta + n^{-1/2}x_i'\delta) - \tau\bigr). \tag{4.2}$$
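The characterization of the regression quantile through a $p$-element subset $h$ with zero residuals can be illustrated by brute force on a tiny data set. The sketch below is not how regression quantiles are computed in practice; it relies on the standard linear programming fact that a minimizer of the check-function objective can be found among the elemental fits $b = X_h^{-1}Y_h$, and all names are hypothetical.

```python
from itertools import combinations
import numpy as np

def check_loss(r, tau):
    """Koenker-Bassett check function rho_tau summed over residuals r."""
    return float(np.sum(r * (tau - (r < 0))))

def regression_quantile_bruteforce(X, y, tau):
    """Search all p-element subsets h; return the best b = X_h^{-1} Y_h and its h."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    best_b, best_h, best_val = None, None, np.inf
    for h in combinations(range(n), p):
        idx = list(h)
        Xh = X[idx]
        if abs(np.linalg.det(Xh)) < 1e-12:   # skip subsets not in general position
            continue
        b = np.linalg.solve(Xh, y[idx])      # fit with zero residuals on h
        val = check_loss(y - X @ b, tau)
        if val < best_val:
            best_b, best_h, best_val = b, h, val
    return best_b, best_h
```

With an intercept-only design and $\tau = 1/2$ the search returns the sample median, and the cost grows like $\binom{n}{p}$, which is why this is only suitable for very small examples.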

Ingredient 2. Since $n^{-1/2}S_n$ is approximately normal, and $A_h$ is bounded,
the probability in (4.1) is approximately a normal density evaluated at $\delta$.
To get a multiplicative bound, we may apply a "Cramer" expansion (or
a saddlepoint approximation). If $S_n$ had a smooth distribution (i.e., satisfied
Cramer's condition), then standard results would apply. Unfortunately, $S_n$
is discrete. The first coordinate of $S_n$ is nearly binomial, and so a multiplicative
bound can be obtained by applying a known saddlepoint formula for
lattice variables [see Daniels (1987)]. Equivalently, approximate by an exact
binomial and (more directly, but with some rather tedious computation)
expand the logarithm of the Gamma function in Stirling's formula. Using
either approach, one can show the following result:

Theorem 3. Let $W \sim \mathrm{binomial}(n, p)$, $J$ be any interval of length $O(\sqrt{n})$
containing $EW = np$, and let $w = O(\sqrt{n\log(n)})$. Then

$$P\{W \in J + w\} = P\{Z \in J + w\}\bigl(1 + O(n^{-1/2}\sqrt{\log(n)})\bigr), \tag{4.3}$$

where $Z \sim \mathcal{N}(np, np(1 - p))$.

A proof based on multinomial expansions is given for the bivariate generalization
in Theorem 1. Note that this result includes an extra factor of
$\sqrt{\log(n)}$. This will allow the bounds to hold except with probability bounded
by an arbitrarily large negative power of $n$. This is clear for the limiting normal
case (by standard asymptotic expansions of the normal c.d.f.). To obtain
such bounds for the distribution of $S_n$ will require some form of Bernstein's
inequality. Such inequalities date to Bernstein's original publication in 1924
[see Bernstein (1964)], but a version due to Hoeffding (1963) may be easier
to apply.
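The flavor of Theorem 3 can be checked numerically by comparing a binomial probability of a shifted interval with its normal counterpart. This is a sketch under assumptions chosen here for illustration: the interval is $J = [np, np + \sqrt{n}]$, the shift is $w = 0.25\sqrt{n\log n}$, the binomial mass function is computed through the log-Gamma function, and the function names are hypothetical. It says nothing about the exact rate in (4.3).

```python
import math

def binomial_interval_prob(n, p, lo, hi):
    """P{lo <= W <= hi} for W ~ binomial(n, p), summing the pmf via log-Gamma."""
    total = 0.0
    for k in range(math.ceil(lo), math.floor(hi) + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1.0 - p))
        total += math.exp(log_pmf)
    return total

def normal_interval_prob(n, p, lo, hi):
    """P{lo <= Z <= hi} for Z ~ N(np, np(1-p)), via erfc for accuracy in the tail."""
    mean, sd = n * p, math.sqrt(n * p * (1.0 - p))
    a = (lo - mean) / (sd * math.sqrt(2.0))
    b = (hi - mean) / (sd * math.sqrt(2.0))
    return 0.5 * (math.erfc(a) - math.erfc(b))

def probability_ratio(n, p, shift_const=0.25):
    """Ratio of the binomial to the normal probability of J + w, with |J| = sqrt(n)."""
    w = shift_const * math.sqrt(n * math.log(n))
    lo, hi = n * p + w, n * p + w + math.sqrt(n)
    return binomial_interval_prob(n, p, lo, hi) / normal_interval_prob(n, p, lo, hi)
```

For large $n$ the ratio should be close to one, in line with a multiplicative (relative) error bound rather than an additive one.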

Ingredient 3. Using Theorem 3, it can be shown (see Section 5) that
the probability in (4.1) may be approximated as

$$P\{\tilde{S}_n \in A_h\}(1 + O(L_n/\sqrt{n})),$$

where the first coordinate of $\tilde{S}_n$ is a sum of $n$ i.i.d. $\mathcal{N}(0, \tau(1 - \tau))$ random
variables, the last $(p - 1)$ coordinates are those of $S_n$, and $L_n = (\log n)^{3/2}$.
Since we seek a normal approximation for this probability with multiplicative
error, at this point one might hope that a known (multidimensional)
"Cramer" expansion or saddlepoint approximation would allow $\tilde{S}_n$ to be
replaced by a normal vector (thus providing the desired result). However,
this will require that the summands be smooth, or (at least) satisfy a form
of Cramer's condition. Let $\dot{x}_i$ denote the last $(p - 1)$ coordinates of $x_i$.
One approach would be to assume $\dot{x}_i$ has a smooth distribution satisfying
the classical form of Cramer's condition. However, to maintain a conditional
form of the analysis, it suffices to impose a condition on $\dot{x}_i$, which is designed
to mimic the effect of a smooth distribution and will hold with probability
tending to one if $\dot{x}_i$ has such a smooth distribution. Condition X1 specifies
just such an assumption.

Note that the characteristic functions of the summands of $\tilde{S}_n$, say, $\{\phi_i(t)\}$,
will also satisfy Condition X1 [equation (3.1)] and so should allow application
of known results on normal approximations. Unfortunately, I have been
unable to find a published result providing this and so Section 5 will present
an independent proof.

Clearly, some additional conditions will be required. Specifically, we will
need conditions that the empirical moments of $\{x_i\}$ converge appropriately,
as specified in Condition X2.

Finally, the approach using characteristic functions is greatly simplified
when the sums, $\tilde{S}_n$, have densities. Again, to avoid using smoothness of the
distribution of $\{x_i\}$ (and thus to maintain a conditional approach), introduce
a random perturbation $V_n$ which is small and has a bounded smooth density
(the bound may depend on $n$). Section 5 will then prove the following:

Theorem 4. Assume Conditions X1 and X2 and the regression quantile
model of Section 1. Let $\delta$ be the argument of the density of $n^{1/2}(\hat\beta - \beta)$,
and suppose

$$\|\delta\| \le d\sqrt{\log n}$$

for some constant $d$. Then a constant $d_0$ can be chosen so that

where $Z_n$ has mean $-G_n\delta$ and covariance $\tau(1 - \tau)H_n$, $d_0$ can be arbitrarily
large, and $V_n$ is a small perturbation [see (5.1)].

Following the proof of this theorem, it will be shown that the effect of $V_n$
can be ignored, if $V_n$ is bounded by $n^{-d_1}$, where $d_1$ may depend on $d$ (but
not on $d_0$).

Ingredient 4. Expanding the densities in (4.1) is trivial if the densities
are sufficiently smooth. The assumption of a bounded first derivative in
Condition F appears to be required to analyze second order terms (beyond
the first order normal approximation).

Ingredient 5. Finally, summing terms involving $\det(X_h)$ in (4.1) over
the $\binom{n}{p}$ summands will require Vinograd's theorem and related results from
matrix theory concerning adjoint matrices [see Gantmacher (1960)].

The remaining ingredients provide the desired "Hungarian" construction. 

Ingredient 6. Extend the density approximation to the joint density
for $\hat\beta(\tau_1)$ and $\hat\beta(\tau_2)$ (when standardized). A major complication is that one
needs $a_n = |\tau_2 - \tau_1| \to 0$, making the covariance matrix tend to singularity.
Thus, we focus on the joint density for standardized versions of $\hat\beta(\tau_1)$ and
$D_n = \hat\beta(\tau_2) - \hat\beta(\tau_1)$. Clearly, this requires modification of the proof for the
univariate case to treat the fact that $D_n$ converges at a rate depending on $a_n$.
The result is given in Theorem 1.

Ingredient 7. Extend the density result to obtain an approximation 
for the quantile transform for the conditional distribution of differences Dn 
(between successive dyadic rationals). This will provide (independent) nor- 
mal approximations to the differences whose sums will have the same covari- 
ance structure as the regression quantile process (at least along a sufficiently 
sparse grid of dyadic rationals). 
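The quantile transform behind Ingredient 7 can be sketched in one dimension: if $D$ has a continuous c.d.f. $F$, then $Z = \Phi^{-1}(F(D))$ is exactly standard normal and is a monotone function of $D$, so the two are coupled on the same probability space. The example below uses a unit exponential variable purely as a stand-in distribution; the names and the clamping constant are assumptions made for illustration, not part of the construction in the paper.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def quantile_transform_to_normal(d, cdf):
    """Map a draw d with continuous c.d.f. `cdf` to Z = Phi^{-1}(cdf(d))."""
    u = min(max(cdf(d), 1e-15), 1.0 - 1e-15)   # keep u strictly inside (0, 1)
    return _STD_NORMAL.inv_cdf(u)

def exponential_cdf(x):
    """C.d.f. of the unit exponential (illustrative stand-in distribution)."""
    return 1.0 - math.exp(-x)

rng = random.Random(0)
draws = [rng.expovariate(1.0) for _ in range(20000)]
coupled = [quantile_transform_to_normal(d, exponential_cdf) for d in draws]
```

The coupled values should have sample mean near 0 and sample variance near 1; the closer the distribution of $D$ is to normal, the smaller the difference between $D$ and its normal partner.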

Ingredient 8. Finally, the Hungarian construction is applied induc- 
tively along the sparse grid of dyadic rationals. This inductive step requires 
some innovative development, mainly because the regression quantile pro- 
cess is not directly expressible in terms of sums of random variables (as are 
the empiric one-sample distribution function and quantile function). 

5. Proof of Theorem 4. Let $\dot{S}_n$ be the last $p - 1$ coordinates of $S_n$ and
$A^{(1)}(\dot{S}_n, h)$ be the interval $\{a: (a, \dot{S}_n) \in A_h\}$. Then,

$$P\{S_n \in A_h\} = P\Bigl\{\sum_{i \notin h}\bigl(I(Y_i \le x_i'\beta + n^{-1/2}x_i'\delta) - \tau\bigr) \in A^{(1)}(\dot{S}_n, h)\Bigr\}$$

$$= P\Bigl\{\sum_{i \notin h}\bigl(I(Y_i \le x_i'\beta) - \tau\bigr) \in A^*\Bigr\} = \sum_{k \in A^*} f_{\mathrm{binomial}}(k; \tau),$$

where $A^*$ is the set $A^{(1)}(\dot{S}_n, h)$ shifted to account for the term $n^{-1/2}x_i'\delta$ in the
indicators. Note that by Hoeffding's inequality [Hoeffding (1963)], for any fixed $d$,
the shift is bounded in absolute value by

$$d\sqrt{n}\sqrt{\log(n)}$$

except with probability bounded by $2n^{-2d^2}$. Thus, we may apply Theorem 3
[equation (4.3)] with $w$ equal to the shift above to obtain the following bound
(to within an additional additive error of $2n^{-2d^2}$):

$$P\{S_n \in A_h\} = P\bigl\{\sqrt{n\tau(1 - \tau)}\,Z \in A^{(1)}(\dot{S}_n, h)\bigr\}(1 + O(a_n/\sqrt{n})),$$

where $Z \sim \mathcal{N}(0, 1)$ and $a_n$ is a bound on $\dot{S}_n$, which may be taken to be of
the form $B\sqrt{\log n}$ (by Hoeffding's inequality). Finally, we obtain

$$P\{S_n \in A_h\} = P\{\tilde{S}_n \in A_h\}(1 + O(a_n/\sqrt{n})) + 2n^{-2d^2},$$

where the first coordinate of $\tilde{S}_n$ is a sum of $n$ i.i.d. $\mathcal{N}(0, \tau(1 - \tau))$ random
variables and the last $p - 1$ coordinates are those of $S_n$.
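The exceptional probability $2n^{-2d^2}$ quoted above is what Hoeffding's inequality gives for a deviation of size $d\sqrt{n\log n}$. The short derivation below assumes, for illustration only, that the quantity being bounded is a sum $T_n$ of $n$ independent terms each confined to an interval of length one; $T_n$ is notation introduced just for this sketch.

```latex
\[
P\bigl\{ |T_n - E T_n| \ge t \bigr\} \le 2\exp\bigl(-2t^2/n\bigr),
\qquad
t = d\sqrt{n\log n}
\;\Longrightarrow\;
2\exp\bigl(-2d^2\log n\bigr) = 2n^{-2d^2}.
\]
```

Since $d$ is arbitrary, the exceptional probability can be made smaller than any fixed negative power of $n$.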

To treat the probability involving $\tilde{S}_n$, standard approaches using characteristic
functions can be employed. In theory, exponential tilting (or saddlepoint
methods) should provide better approximations, but since we require
only the order of the leading error term, we can proceed more directly. As
in Einmahl (1989), the first step is to add an independent perturbation so
that the sum has an integrable density: specifically, for fixed $h \in \mathcal{H}$ let $V_n$ be
a random variable (independent of all observations) with a smooth bounded
density and for which (for each $h \in \mathcal{H}$)

$$\|V_n\| \le n^{-d_1}, \tag{5.1}$$

where $d_1$ will be chosen later. Define

$$S_n^* = \tilde{S}_n + V_n.$$

We now allow $A_h$ to be any (arbitrary) set, say, $A$. Thus, $S_n^*$ has a density
and we can write [with $c_\pi = (2\pi)^{-p}$]

$$P\{S_n^*/\sqrt{n} \in A\} = c_\pi \int \mathrm{Vol}(A)\,\phi_{\mathrm{Unif}(A)}(t)\,\phi_{\tilde{S}_n}(t/\sqrt{n})\,\phi_{V_n}(t/\sqrt{n})\,dt,$$

where $\phi_U$ denotes the characteristic function of the random variable $U$.



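Break the domain of integration into 3 sets: $\|t\| \le d_2\sqrt{\log(n)}$, $d_2\sqrt{\log(n)} \le
\|t\| \le \varepsilon\sqrt{n}$, and $\|t\| \ge \varepsilon\sqrt{n}$.
On $\|t\| \le d_2\sqrt{\log(n)}$, expand $\log\phi_{\tilde{S}_n}(t/\sqrt{n})$. For this, compute

$$\mu_i \equiv E\,x_i\bigl(\tau - I(y_i \le x_i'\beta + x_i'\delta/\sqrt{n})\bigr) = -f_i(F_i^{-1}(\tau))\,x_ix_i'\delta/\sqrt{n} + O(\|x_i\|^3\|\delta\|^2/n),$$

$$\Sigma_i \equiv \mathrm{Cov}\bigl[x_i\bigl(\tau - I(y_i \le x_i'\beta + x_i'\delta/\sqrt{n})\bigr)\bigr] = x_ix_i'\,\tau(1 - \tau) + O(\|x_i\|^3\|\delta\|^2/n).$$

Hence, using the boundedness of $\|\delta\|$ and $\|t\|$ (on this first interval),

$$\phi_{\tilde{S}_n}(t/\sqrt{n}) = \exp\Bigl\{i\sum_i \mu_i't/\sqrt{n} - \frac{1}{2}\sum_i t'\Sigma_it/n + O\Bigl(\frac{(\log n)^{3/2}}{\sqrt{n}}\Bigr)\Bigr\}$$

$$= \exp\Bigl\{-i\,t'G_n\delta - \frac{1}{2}\tau(1 - \tau)\,t'H_nt + O\bigl((\log n)^{3/2}/\sqrt{n}\bigr)\Bigr\},$$

where $G_n$ and $H_n$ are defined in Condition X2 [see (3.2) and (3.3)].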

For the other two intervals on the $t$-axis, the integrands will be bounded
by an additive error times

$$\int |\phi_{V_n}(t/\sqrt{n})|\,dt = O\bigl(n^{p(d_1 + 1/2)}\bigr)$$

since $\|V_n\| \le n^{-d_1}$.

On $\|t\| \le \varepsilon\sqrt{n}$, the summands are bounded and so their characteristic
functions satisfy $|\phi_i(s)| \le (1 - b\|s\|^2)$ for some constant $b$. Thus, on $d_2\sqrt{\log(n)} \le
\|t\| \le \varepsilon\sqrt{n}$,

$$|\phi_{\tilde{S}_n}(t/\sqrt{n})| \le \bigl(1 - bd_2^2\log(n)/n\bigr)^{n - p} \le c_1n^{-bd_2^2}$$

for some constant $c_1$. Therefore, integrating times $\phi_{V_n}(t/\sqrt{n})$ provides an
additive bound of order $n^{-d^*}$, where $d^* = bd_2^2 - p(d_1 + 1/2)$ and (for any $d_0$)
$d_2$ can be chosen sufficiently large so that $d^* > d_0$.

Finally, on $\|t\| \ge \varepsilon\sqrt{n}$, Condition X1 [see (3.1)] gives an additive bound
of $\eta^n$ directly and, again (as on the previous interval), an additive error
bounded by $n^{-d_0}$ can be obtained.

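Therefore, it now follows that we can choose $d_0$ (depending on $d$, $d_1$, $d_2$
and $d^*$) so that

$$P\{\tilde{S}_n + V_n \in A\} = \int \mathrm{Vol}(A)\,\phi_{\mathrm{Unif}(A)}(t)\,\phi_{\mathcal{N}(-G\delta,\,\tau(1 - \tau)H)}(t)\,\phi_{V_n}(t/\sqrt{n})\,dt$$

$$\times\,\bigl(1 + O((\log^3(n)/n)^{1/2})\bigr) + O(n^{-d_0}),$$

from which Theorem 4 follows.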

Finally, we show that the contribution of $V_n$ can be ignored:

$$|P\{\tilde{S}_n \in A_h\} - P\{S_n^* \in A_h\}| = |P\{\tilde{S}_n \in A_h\} - P\{\tilde{S}_n + V_n \in A_h + V_n\}|$$

$$\le P\{\tilde{S}_n + V_n \in A_h \,\triangle\, (A_h + V_n)\},$$

where $\triangle$ denotes the symmetric difference of the sets. Since $V_n$ is bounded
and $A_h = X_hR$, this symmetric difference is contained in a set, $D$, which is
the union of $2p$ (boundary) parallelepipeds each of the form $X_hR_j$, where $R_j$
is a rectangle one of whose coordinates has width $2n^{-d_1}$ and all other coordinates
have length 1. Thus, applying Theorem 4 (as proved for the set $A = D$),

$$|P\{\tilde{S}_n \in A_h\} - P\{S_n^* \in A_h\}| \le P\{\tilde{S}_n + V_n \in D\}$$

$$\le c\,\mathrm{Vol}(D) + c'n^{-d_0},$$

where $c$ and $c'$ are constants, and $d_1$ may be chosen arbitrarily large.

6. Normal approximation with nearly root-n multiplicative error. 

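Theorem 5. Assume Conditions X1, X2, F and the regression quantile
model of Section 1. Let $\delta$ be the argument of the density of $\hat\delta_n \equiv n^{1/2}(\hat\beta(\tau) -
\beta(\tau))$ and suppose

$$\|\delta\| \le d\sqrt{\log(n)}$$

for some constant $d$. Then, uniformly in $\varepsilon \le \tau \le 1 - \varepsilon$ (for $\varepsilon > 0$),

$$f_{\hat\delta}(\delta) = \varphi_{\Sigma}(\delta)\bigl(1 + O((\log^3(n)/n)^{1/2})\bigr),$$

where $\varphi_{\Sigma}$ denotes the normal density with covariance $\Sigma_n = \tau(1 - \tau)G_n^{-1}H_nG_n^{-1}$
with $G_n$ and $H_n$ given by (3.2) and (3.3).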

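Proof. Recall the basic formula for the density (4.1):

$$f_{\hat\delta}(\delta) = n^{-p/2}\det(X_h)\,P\{S_n \in A_h\}\prod_{i\in h} f_i(x_i'\beta + n^{-1/2}x_i'\delta).$$

By Theorem 4, ignoring the multiplicative and additive error terms given in this result and setting $c_p' = (2\pi)^{-p/2}$,

$$P\{S_n \in A_h\} = P\{Z_n \in A_h/\sqrt{n}\}$$

$$= c_p'\,|H_n|^{-1/2}\int_{A_h/\sqrt{n}} \exp\Bigl\{-\frac{(z + G_n\delta)'H_n^{-1}(z + G_n\delta)}{2\tau(1-\tau)}\Bigr\}\,dz$$

$$= c_p'\,|H_n|^{-1/2}\exp\Bigl\{-\frac12\delta'\Sigma_n^{-1}\delta\Bigr\}\int_{A_h/\sqrt{n}} dz\,\bigl(1 + O(n^{-1/2})\bigr)$$

$$= c_p'\,n^{-p/2}\,|X_h|\,|H_n|^{-1/2}\exp\Bigl\{-\frac12\delta'\Sigma_n^{-1}\delta\Bigr\}\bigl(1 + O(n^{-1/2})\bigr),$$

since $z$ is bounded by a constant times $n^{-1/2}$ on $A_h/\sqrt{n}$ and the last integral equals $\mathrm{Vol}(A_h/\sqrt{n}) = n^{-p/2}|X_h|$.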
By Ingredient 4, the product is

$$\prod_{i\in h} f_i(x_i'\beta)\bigl(1 + O(\|\delta\| n^{-1/2})\bigr).$$

This gives the main term of the approximation as

$$c_p'\sum_{h\in\mathcal{H}} n^{-p}\,|X_h|^2 \prod_{i\in h} f_i(x_i'\beta)\,|H_n|^{-1/2}\exp\Bigl\{-\frac12\delta'\Sigma_n^{-1}\delta\Bigr\}.$$

The penultimate step is to apply results from matrix theory on adjoint matrices [specifically, the Cauchy-Binet theorem and the "trace" theorem; see, e.g., Gantmacher (1960), pages 9 and 87]: the sum above is just the trace of the $p$th adjoint of $(X'D_fX)$, which equals $\det(X'D_fX)$.

The various determinants combine (with the factor $n^{-p}$) to give $\det(\Sigma_n)^{-1/2}$, which provides the asymptotic normal density we want.
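
The Cauchy-Binet step, $\sum_h \det(X_h)^2\prod_{i\in h} f_i = \det(X'D_fX)$ with the sum over all $p$-element subsets $h$ of rows, can be verified on a small random example. This is a minimal sketch using only the standard library; the helper names (`det`, `weighted_gram`, `subset_sum`) and the random design are hypothetical and serve only to illustrate the identity.

```python
from itertools import combinations
import random

def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def weighted_gram(X, f):
    """Return X' D_f X, where D_f = diag(f)."""
    n, p = len(X), len(X[0])
    return [[sum(f[i] * X[i][j] * X[i][k] for i in range(n))
             for k in range(p)] for j in range(p)]

def subset_sum(X, f):
    """Sum over p-subsets h of det(X_h)^2 * prod_{i in h} f_i."""
    n, p = len(X), len(X[0])
    total = 0.0
    for h in combinations(range(n), p):
        w = 1.0
        for i in h:
            w *= f[i]
        total += det([X[i] for i in h]) ** 2 * w
    return total

if __name__ == "__main__":
    rng = random.Random(0)
    X = [[1.0] + [rng.gauss(0, 1) for _ in range(2)] for _ in range(7)]
    f = [rng.uniform(0.5, 2.0) for _ in range(7)]
    print(subset_sum(X, f), det(weighted_gram(X, f)))
```

The two printed numbers should agree up to floating-point rounding.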

Finally, we need to combine the multiplicative and additive errors into a single multiplicative error. So consider $\|\delta\| \le d\sqrt{\log(n)}$ (for some constant $d$). Then, the asymptotic normal density is bounded below by $n^{-cd}$ for some constant $c$.

Thus, since the constant $d_0$ (which depends on $d_1$, $d_2$, $d^*$ and $\eta$) can be chosen so that the additive errors are smaller than $O(n^{-cd-1/2})$, the error is entirely subsumed in the multiplicative factor: $(1 + O((\log^3(n)/n)^{1/2}))$. □

7. The Hungarian construction. We first prove Theorem 1, which pro- 
vides the bivariate normal approximation. 

Proof of Theorem 1. The proof follows the development in Theo- 
rem 5. The first step treats the first (intercept) coordinate. Since the bi- 
nomial expansions were omitted in the proof of Theorem 3, details for the 
trinomial expansion needed for the bivariate case here will be presented. 

The binomial sum in the first coordinate of (4.2) will be split into the sum of observations in the intervals $[x_i'\beta(0), x_i'\beta(\tau_1))$, $[x_i'\beta(\tau_1), x_i'\beta(\tau_1 + a_n))$ and $[x_i'\beta(\tau_1 + a_n), x_i'\beta(1))$. The expected number of observations in each interval is within $p$ of $n$ times the length of the corresponding interval. Thus, ignoring an error of order $1/n$, we expand a trinomial with $n$ observations and $p_1 = \tau_1$ and $p_2 = a_n$. Let $(N_1, N_2, N_3)$ be the (trinomially distributed) numbers of observations in the respective intervals and, measuring the counts as deviations $k_1$ and $k_2$ from their expected values, consider $P^* = P\{N_1 = np_1 + k_1,\ N_2 = np_2 + k_2,\ N_3 = n(1-p_1-p_2) - k_1 - k_2\}$. We may take

$$k_1 = O((n\log n)^{1/2}), \qquad k_2 = O(a_n(\log n)^{1/2}), \tag{7.1}$$

since these bounds are exceeded with probability bounded by $n^{-d}$ for any (sufficiently large) $d$. So $P^* = A \times B$, where

$$A = \frac{n!}{(np_1 + k_1)!\,(np_2 + k_2)!\,(n(1-p_1-p_2) - k_1 - k_2)!}.$$

Expanding (using Stirling's formula and some computation),
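$$A = \frac{1}{2\pi}\exp\Bigl\{2 + \bigl(n+\tfrac12\bigr)\log(n+1) - \bigl(np_1+k_1+\tfrac12\bigr)\log\Bigl(np_1\Bigl(1+\frac{k_1+1}{np_1}\Bigr)\Bigr)$$
$$\qquad - \bigl(np_2+k_2+\tfrac12\bigr)\log\Bigl(np_2\Bigl(1+\frac{k_2+1}{np_2}\Bigr)\Bigr)$$
$$\qquad - \bigl(n(1-p_1-p_2)-k_1-k_2+\tfrac12\bigr)\log\Bigl(n(1-p_1-p_2)\Bigl(1-\frac{k_1+k_2-1}{n(1-p_1-p_2)}\Bigr)\Bigr) + O\Bigl(\frac{1}{np_2}\Bigr)\Bigr\}$$

$$= \frac{1}{2\pi}\exp\Bigl\{\tfrac12\log n - np_1\log p_1 - \bigl(k_1+\tfrac12\bigr)\log(np_1) - np_2\log p_2 - \bigl(k_2+\tfrac12\bigr)\log(np_2)$$
$$\qquad - n(1-p_1-p_2)\log(1-p_1-p_2) + \bigl(k_1+k_2-\tfrac12\bigr)\log\bigl(n(1-p_1-p_2)\bigr)$$
$$\qquad - \frac{k_1^2}{2np_1} - \frac{k_2^2}{2np_2} - \frac{(k_1+k_2)^2}{2n(1-p_1-p_2)} + O\Bigl(\frac{k_2^3}{(np_2)^2}\Bigr)\Bigr\}$$

$$= \frac{1}{2\pi}\exp\Bigl\{-\log n - \bigl(np_1+k_1+\tfrac12\bigr)\log p_1 - \bigl(np_2+k_2+\tfrac12\bigr)\log p_2$$
$$\qquad - \bigl(n(1-p_1-p_2)-k_1-k_2+\tfrac12\bigr)\log(1-p_1-p_2)$$
$$\qquad - \frac{k_1^2}{2np_1} - \frac{k_2^2}{2np_2} - \frac{(k_1+k_2)^2}{2n(1-p_1-p_2)} + O\Bigl(\frac{k_2^3}{(na_n)^2}\Bigr)\Bigr\},$$

$$B = \exp\bigl\{(np_1+k_1)\log p_1 + (np_2+k_2)\log p_2 + (n(1-p_1-p_2)-k_1-k_2)\log(1-p_1-p_2)\bigr\}.$$

Therefore,

$$A \times B = \frac{1}{2\pi}\exp\Bigl\{-\log n - \tfrac12\log p_1 - \tfrac12\log p_2 - \tfrac12\log(1-p_1-p_2)$$
$$\qquad - \frac{k_1^2}{2np_1} - \frac{k_2^2}{2np_2} - \frac{(k_1+k_2)^2}{2n(1-p_1-p_2)} + O\Bigl(\frac{k_2^3}{(na_n)^2}\Bigr)\Bigr\}.$$

Some further simplification shows that $A \times B$ gives the usual normal approximation to the trinomial with a multiplicative error of $(1 + o(n^{-1/2}))$ [when $k_1$ and $k_2$ satisfy (7.1)].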
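
As a numerical sanity check on the normal approximation to the trinomial, the exact log-probability can be compared with the leading Gaussian term. This is a hedged sketch: the function names and the particular values of $n$, $p_1$, $p_2$, $k_1$ and $k_2$ are illustrative assumptions, and the cell counts are passed as integers to avoid rounding $np_i$.

```python
import math

def trinomial_logpmf(n, m1, m2, p1, p2):
    """Exact log P{N1 = m1, N2 = m2, N3 = n - m1 - m2}."""
    m3 = n - m1 - m2
    p3 = 1.0 - p1 - p2
    return (math.lgamma(n + 1) - math.lgamma(m1 + 1)
            - math.lgamma(m2 + 1) - math.lgamma(m3 + 1)
            + m1 * math.log(p1) + m2 * math.log(p2) + m3 * math.log(p3))

def normal_log_approx(n, k1, k2, p1, p2):
    """Leading term: -log(2*pi*n) - (1/2) sum log p_i - sum k_i^2/(2 n p_i),
    with k3 = -(k1 + k2)."""
    p3 = 1.0 - p1 - p2
    return (-math.log(2.0 * math.pi) - math.log(n)
            - 0.5 * (math.log(p1) + math.log(p2) + math.log(p3))
            - k1 ** 2 / (2.0 * n * p1)
            - k2 ** 2 / (2.0 * n * p2)
            - (k1 + k2) ** 2 / (2.0 * n * p3))

if __name__ == "__main__":
    n, p1, p2 = 10000, 0.30, 0.05      # expected counts 3000 and 500
    k1, k2 = 20, 5                     # deviations from the expected counts
    exact = trinomial_logpmf(n, 3000 + k1, 500 + k2, p1, p2)
    approx = normal_log_approx(n, k1, k2, p1, p2)
    print(exact, approx, exact - approx)
```

The difference of the two log-values is the logarithm of the multiplicative error factor; it should be small when the deviations are moderate relative to the expected counts.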

The next step of the proof follows that of Theorem 4 (see Ingredient 3). 
Since the proof is based on expanding characteristic functions (which do 
not involve the inverse of the covariance matrices), all uniform error bounds 
continue to hold. This extends the result of Theorem 4 to the bivariate case: 

$$P\{S_n(\tau_1) \in A_{h_1},\ S_n(\tau_2) \in A_{h_2}\}$$
$$= P\{Z_1 \in A_{h_1}/\sqrt{n},\ Z_2 \in A_{h_2}/\sqrt{n}\} \tag{7.2}$$
$$= P\{Z_1 \in A_{h_1}/\sqrt{n}\} \times P\{Z_2 - Z_1 \in A_{h_2}/\sqrt{n} - Z_1 \mid Z_1\}$$

for appropriate normally distributed $(Z_1, Z_2)$ (depending on $n$). This last equation is needed to extend the argument of Theorem 5, which involves integrating normal densities. The joint covariance matrix for $(S_n(\tau_1), S_n(\tau_2))$ is nearly singular (for $\tau_2 - \tau_1$ small) and complicates the bounds for the integral of the densities. The first factor above can be treated exactly as in the proof of Theorem 5, while the conditional densities involved in the second factor can be handled by simple rescaling. This provides the desired generalization of Theorem 5.

Thus, the next step is to develop the parameters of the normal distribution for $(B_n(\tau_1), R_n)$ [see (3.4), (3.5)] in a usable form. The covariance matrix for $(B_n(\tau_1), B_n(\tau_2))$ has blocks of the form

$$\begin{pmatrix} \tau_1(1-\tau_1)\Lambda_{11} & \tau_1(1-\tau_2)\Lambda_{12} \\ \tau_1(1-\tau_2)\Lambda_{21} & \tau_2(1-\tau_2)\Lambda_{22} \end{pmatrix},$$

where $\Lambda_{ij} = G_n^{-1}(\tau_i)H_nG_n^{-1}(\tau_j)$ with $G_n$ and $H_n$ given in Condition X2 [see (3.2) and (3.3)].

Expanding $G_n(\tau)$ about $\tau = \tau_1$ (using the differentiability of the densities from Condition F),

$$\Lambda_{ij} = \Lambda_{11} + (\tau_2 - \tau_1)\Lambda'_{ij} + o(|\tau_2 - \tau_1|),$$

where the $\Lambda'_{ij}$ are derivatives of $G_n$ at $\tau_1$ (note that $\Lambda'_{11} = 0$). Straightforward matrix computation now yields the joint covariance for $(B_n(\tau_1), R_n)$:

(7.3) cov(B4raiio = (7,('_7/4; (j-;;)^y+„,|.,-.,i), 

where $\Lambda^*_{ij}$ are uniformly bounded matrices.



Cov(B„(ti),B„(t2)) 



Thus, the conditional distribution of i?„ = \/ (t2 — Ti)(i?„(r2) — Bn{Ti)) 
given Bn{Ti) has moments 

(7.4) E[Rn\Bn{Ti)] = (t2 - n ) A^/ Ai2/(ri (1 - n)). 



(7.5) Cov[ii„|B„(ri)] = (r2-ri) 



A22 - J' ""^ A^iAr/A 



12 



and analogous equations also hold for $\{Z_2 - Z_1 \mid Z_1\}$.
Finally, recalling that $\tau_2 - \tau_1 = a_n$, the second term in (7.2) can be written

[ Vn y/n J I \/n(r2 - n) ^ftia^i 

Thus, since the conditional covariance matrix is uniformly bounded except for the $a_n = (\tau_2 - \tau_1)$ factor, the argument of Theorem 5 also applies directly to this conditional probability. □

Finally, the above results are used to apply the quantile transform for in- 
crements between dyadic rationals inductively in order to obtain the desired 
"Hungarian" construction. The proof of Theorem 2 is as follows: 

Proof of Theorem 2. (i) Following the approach in Einmahl (1989), the first step is to provide the result of Theorem 1 for conditional densities one coordinate at a time. Using the notation of Theorem 1, let $\tau_1 = k/2^\ell$ and $\tau_2 = (k+1)/2^\ell$ be successive dyadic rationals (between $\varepsilon$ and $1-\varepsilon$) with denominator $2^\ell$. So $a_n = 2^{-\ell}$. Let $R_m$ be the $m$th coordinate of $R_n(\tau_1, \tau_2)$ [see (3.5)], let $\tilde R_m$ be the vector of coordinates before the $m$th one, and let $S = B_n(\tau_1)$. Then the conditional density of $R_m \mid (\tilde R_m, S)$ satisfies

$$f_{R_m \mid (\tilde R_m, S)}(r_1 \mid r_2, s) = \varphi_{\mu,\sigma}(r_1 \mid r_2, s)\Bigl(1 + O\Bigl(\frac{(\log n)^{3/2}}{\sqrt{n}}\Bigr)\Bigr) \tag{7.6}$$

for $|r_1| \le D\sqrt{\log n}$, $\|r_2\| \le D\sqrt{\log n}$, and $\|s\| \le D\sqrt{\log n}$, and where $\mu$ and $\sigma$ are easily derived from (7.4) and (7.5). Note that $\mu$ has the form

$$\mu = \sqrt{a_n}\,\alpha' S, \tag{7.7}$$

where $\|\alpha\|$ can be bounded (independent of $n$) and $\sigma$ can be bounded away from zero and infinity (independent of $n$).

This follows since the conditional densities are ratios of marginal densities of the form $f_Y(y) = \int f_{X,Y}(x,y)\,dx$ (with $f_{X,Y}$ satisfying Theorem 1). The integral over $\|x\| \le D\sqrt{\log n}$ has the multiplicative error bound directly. The remainder of the integral is bounded by $n^{-d}$, which is smaller than the normal integral over $\|x\| \le D\sqrt{\log n}$ (see the end of the proof of Theorem 5).

(ii) The second step is to develop a bound on the (conditional) quantile transform in order to approximate an asymptotic normal random variable by a normal one. The basic idea appears in Einmahl (1989). Clearly, from (7.6),

$$\int_{-\infty}^{r} f_{R_m \mid (\tilde R_m, S)}(u \mid r_2, s)\,du = \int_{-\infty}^{r} \varphi_{\mu,\sigma}(u \mid r_2, s)\,du\,\Bigl(1 + O\Bigl(\frac{(\log n)^{3/2}}{\sqrt{n}}\Bigr)\Bigr)$$

for $|r| \le D\sqrt{\log n}$, $\|r_2\| \le D\sqrt{\log n}$, and $\|s\| \le D\sqrt{\log n}$. By Condition F, the conditional densities (of the response given $x$) are bounded above zero on $\varepsilon \le \tau \le 1-\varepsilon$. Hence, the inverse of the above versions of the c.d.f.'s also satisfy this multiplicative error bound, at least for the variables bounded by $D\sqrt{\log n}$. Thus, the quantile transform can be applied to show that there is a normal random variable, $Z^*$, such that $(R_m - Z^*) = O((\log n)^{3/2}/\sqrt{n})$ so long as $R_m$ and the quantile transform of $R_m$ are bounded by $D\sqrt{\log n}$. Using the conditional mean and variance [see (7.7)], and the fact that the random variables exceed $D\sqrt{\log n}$ with probability bounded by $n^{-d}$ (where $d$ can be made large by choosing $D$ large enough), there is a random variable $Z_m$ that can be chosen independently so that

$$R_m = \sqrt{a_n}\,\alpha' S + Z_m + O\Bigl(\frac{(\log n)^{3/2}}{\sqrt{n}}\Bigr) \tag{7.8}$$

except with probability bounded by $n^{-d}$.
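
The quantile-transform coupling in step (ii) can be illustrated on a toy example. The sketch below is an assumption-laden illustration, not the paper's conditional c.d.f.: it takes a hypothetical Edgeworth-type c.d.f. $F(r) = \Phi(r) - \epsilon(r^2-1)\varphi(r)$, whose density is $\varphi(r)(1 + \epsilon(r^3 - 3r))$, and computes the coupled normal value $z = \Phi^{-1}(F(r))$. To first order $r - z \approx \epsilon(r^2 - 1)$, so a small multiplicative perturbation of the normal density gives a correspondingly small coupling error on a bounded range.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal from the Python standard library

def perturbed_cdf(r, eps):
    """Toy c.d.f. (an assumption for illustration):
    Phi(r) - eps*(r^2 - 1)*phi(r)."""
    return STD_NORMAL.cdf(r) - eps * (r * r - 1.0) * STD_NORMAL.pdf(r)

def coupled_normal(r, eps):
    """Quantile transform: the normal value with the same c.d.f. level as r."""
    return STD_NORMAL.inv_cdf(perturbed_cdf(r, eps))

def max_coupling_error(eps, bound, steps=601):
    """Largest |r - z| over a grid of r in [-bound, bound]."""
    worst = 0.0
    for j in range(steps):
        r = -bound + 2.0 * bound * j / (steps - 1)
        worst = max(worst, abs(r - coupled_normal(r, eps)))
    return worst

if __name__ == "__main__":
    for eps in (0.02, 0.01, 0.005):
        # The error shrinks roughly in proportion to eps.
        print(eps, max_coupling_error(eps, bound=3.0))
```

In the setting of the proof, the role of $\epsilon$ is played by the $O((\log n)^{3/2}/\sqrt{n})$ factor and the range is $|r| \le D\sqrt{\log n}$.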

(iii) Finally, the "Hungarian" construction will be developed inductively. Let $\tau(k,\ell) = k/2^\ell$ and consider induction on $\ell$. First consider the case where $\tau > 1/2$; the argument for $\tau < 1/2$ is entirely analogous.

Define $\varepsilon_n^* = c(\log n)^{3/2}/\sqrt{n}$, where $c$ bounds the big-$O$ term in any equation of the form (7.8). Let $\Lambda$ be a bound [uniform over $\tau \in (\varepsilon, 1-\varepsilon)$] on $\alpha$ in (7.8). The induction hypothesis is as follows: there are normal random vectors $Z_n(k,\ell)$ such that

$$\Bigl\| B_n\Bigl(\frac{k}{2^\ell}\Bigr) - Z_n(k,\ell) \Bigr\| \le \varepsilon(\ell) \tag{7.9}$$

except with probability $2\ell n^{-d}$, where for each $\ell$, $Z_n(\cdot,\ell)$ has the same covariance structure as $B_n(\cdot/2^\ell)$, and where

$$\varepsilon(\ell) = \ell\,\varepsilon_n^* \prod_{i=1}^{\ell}\bigl(1 + \Lambda 2^{-i/2}\bigr). \tag{7.10}$$

Note: since the earlier bounds apply only for intervals whose lengths exceed $n^{-a}$ (for some positive $a$), $\ell$ must be taken to be smaller than $a\log_2(n) = O(\log n)$. Thus, the bound in (7.10) becomes $O((\log n)^{5/2}/\sqrt{n})$, as stated in Theorem 1.
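
The reason the bound (7.10) grows only like $\ell\,\varepsilon_n^*$ is that the product $\prod_i (1 + \Lambda 2^{-i/2})$ converges: it is at most $\exp(\Lambda\sum_{i\ge 1} 2^{-i/2}) = \exp(\Lambda(1+\sqrt{2}))$. A small sketch, with hypothetical function names and arbitrary illustrative values for the constants, compares the closed-form bound with the error recursion used in the induction step:

```python
import math

def error_bound(ell, eps_star, lam):
    """Closed-form bound: ell * eps_star * prod_{i=1..ell} (1 + lam * 2^(-i/2))."""
    prod = 1.0
    for i in range(1, ell + 1):
        prod *= 1.0 + lam * 2.0 ** (-i / 2.0)
    return ell * eps_star * prod

def recursive_error(ell, eps_star, lam):
    """Iterate e(l) = (1 + lam * 2^(-l/2)) * e(l-1) + eps_star from e(0) = 0."""
    e = 0.0
    for level in range(1, ell + 1):
        e = (1.0 + lam * 2.0 ** (-level / 2.0)) * e + eps_star
    return e

if __name__ == "__main__":
    n, c, lam, a = 10**6, 1.0, 2.0, 0.5          # illustrative constants only
    eps_star = c * math.log(n) ** 1.5 / math.sqrt(n)
    max_level = int(a * math.log2(n))            # levels allowed by the Note
    for ell in range(1, max_level + 1):
        print(ell, recursive_error(ell, eps_star, lam),
              error_bound(ell, eps_star, lam))
```

The recursion never exceeds the closed-form bound, and with $\ell = O(\log n)$ the bound is of order $(\log n)^{5/2}/\sqrt{n}$.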

To prove the induction result, note first that Theorem 1 (or Theorem 5) provides the normal approximation for $B_n(1/2)$ for $\ell = 1$. The induction step is proved as follows: following Einmahl (1989), take two consecutive dyadic rationals $\tau(k,\ell)$ and $\tau(k-1,\ell)$ with $k$ odd. So

$$\tau(k-1,\ell) = [k/2]/2^{\ell-1} = \tau([k/2],\ell-1).$$
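
The dyadic bookkeeping in the induction step (for odd $k$, the left neighbour of $\tau(k,\ell)$ already appears at level $\ell-1$) can be checked with exact rational arithmetic. The function names below are hypothetical and used only for this illustration.

```python
from fractions import Fraction

def tau(k, ell):
    """Dyadic rational k / 2^ell as an exact fraction."""
    return Fraction(k, 2 ** ell)

def parent(k, ell):
    """For odd k, return (k', ell - 1) with tau(k', ell - 1) == tau(k - 1, ell)."""
    if k % 2 != 1:
        raise ValueError("k must be odd")
    return k // 2, ell - 1

if __name__ == "__main__":
    for ell in range(1, 5):
        for k in range(1, 2 ** ell, 2):
            kp, lp = parent(k, ell)
            print(f"tau({k - 1},{ell}) = {tau(k - 1, ell)} = tau({kp},{lp})")
```

Each new point at level $\ell$ is therefore conditioned on a neighbour whose approximation is already available from the previous level.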

Condition each coordinate of $B_n(\tau(k,\ell))$ on previous coordinates and on $B_n(\tau([k/2],\ell-1))$. Let $b_n(\tau(k,\ell)) = b_n(k/2^\ell)$ be one such coordinate. Now, as above, define $R(k,\ell)$ by

$$b_n(\tau(k,\ell)) = b_n(\tau([k/2],\ell-1)) + R(k,\ell).$$

From (7.8), there is a normal random variable $Z_n(k,\ell)$ such that

$$\bigl|R(k,\ell) - \sqrt{2^{-\ell}}\,\alpha' B_n(\tau([k/2],\ell-1)) - Z_n(k,\ell)\bigr| \le \varepsilon_n^*.$$

By the induction hypothesis for $(\ell-1)$, $B_n(\tau([k/2],\ell-1))$ is approximable by normal random variables to within $\varepsilon(\ell-1)$ (except with probability $2(\ell-1)n^{-d}$). Thus, a coordinate $b_n(\tau([k/2],\ell-1))$ is also approximable with this error, and the error in approximating $\sqrt{a_n}\,\alpha' B_n(\tau([k/2],\ell-1))$ is bounded by $\varepsilon(\ell-1)$ times $\Lambda\sqrt{a_n} = \Lambda 2^{-\ell/2}$. Finally, since $Z_n(k,\ell)$ is independent of these normal variables, the errors can be added to obtain

$$\bigl(1 + \Lambda 2^{-\ell/2}\bigr)\varepsilon(\ell-1) + \varepsilon_n^*.$$

Therefore, except with probability less than $2(\ell-1)n^{-d} + 2n^{-d} = 2\ell n^{-d}$, the induction hypothesis (7.9) holds with error

$$(\ell-1)\,\varepsilon_n^* \prod_{i=1}^{\ell-1}\bigl(1 + \Lambda 2^{-i/2}\bigr) \times \bigl(1 + \Lambda 2^{-\ell/2}\bigr) + \varepsilon_n^* \le \ell \prod_{i=1}^{\ell}\bigl(1 + \Lambda 2^{-i/2}\bigr)\varepsilon_n^* = \varepsilon(\ell),$$

and the induction is proven.

The theorem now follows since the piecewise linear interpolants satisfy 
the same error bound [see Neocleous and Portnoy (2008)]. □ 

APPENDIX 

Result 1. Under the conditions for the theorems here, the coverage probability for the confidence interval (2.3) is $1 - 2\alpha + O((\log n)n^{-2/3})$, which is achieved at $h_n = c\sqrt{\log n}\,n^{-1/3}$ (where $c$ is a constant).

Sketch of proof. Recall the notation of Remark 2 in Section 2. Using Theorem 1 and the quantile transform as described in the first steps of Theorem 2 (and not needing the dyadic expansion argument), it can be shown that there is a bivariate normal pair $(W, Z)$ such that

$$\sqrt{n}(\hat\beta(\tau) - \beta(\tau)) = W + R_n, \qquad R_n = O_p(n^{-1/2}(\log n)^{3/2}),$$
$$\sqrt{n}(\hat\Delta(h_n) - \Delta(h_n)) = Z + R_n^*, \qquad R_n^* = O_p(n^{-1/2}(\log n)^{3/2}). \tag{A.1}$$

Note that from the proofs of Theorems 1 and 2, the $O_p$ terms above are actually $O$ terms except with probability $n^{-d}$ where $d$ is an arbitrary fixed constant. The "almost sure" results above take $d > 1$, but $d = 1$ will suffice for the bounds on the coverage probability here.
Incorporating the approximation error in (A.1),

$$\sqrt{n}(\hat\delta - \delta) = Z/h_n + R_n^*/h_n + O(n^{1/2}h_n^2).$$

Now consider expanding $s_a(\hat\delta)$. First, note that under the design conditions here, $s_a$ will be of exact order $n^{-1/2}$; specifically, if $X$ is replaced by $\sqrt{n}X$, all terms involving $X'X$ will remain bounded, and we may focus on $\sqrt{n}s_a(\delta)$. Note also that for $h_n = O(n^{-1/3})$, the terms in the expansion of $(\hat\delta - \delta)$ tend to zero [specifically, $1/(\sqrt{n}h_n) = O(n^{-1/6})$]. So the sparsity may be expanded in a Taylor series as follows:

$$\sqrt{n}s_a(\hat\delta) = \sqrt{n}s_a(\delta) + b_1'(\hat\delta - \delta) + b_2(\hat\delta - \delta) + b_3(\hat\delta - \delta) + O(n^{-2/3}),$$

where $b_1$ is a (gradient) vector that can be defined in terms of $X$ and $\beta(\tau)$ (and its derivatives), $b_2$ is a quadratic function (of its vector argument) and $b_3$ is a cubic function. Write $K$ for the sum of the terms following $\sqrt{n}s_a(\delta)$. Note that under the design conditions, all the coefficients in $b_1$, $b_2$ and $b_3$ are bounded, and so it is not hard to show that all the terms in $K$ tend to zero as long as $h_n\sqrt{n} \to \infty$. Specifically, if $h_n$ is of order $n^{-1/3}$, then all the terms in $K$ tend to zero. Also, $R_n^*$ is within a $\log n$ factor of $O(n^{-1/2})$ and $h_n^2$ is even smaller. Finally, $Z$ is a difference of two quantiles separated by $2h$, and so has variance proportional to $h$. Thus, $E(b_1'Z/(\sqrt{n}h_n))^2 = O(1/(nh_n))$. Thus, not only does $b_1'Z/(\sqrt{n}h_n) \to_p 0$, but powers of this term greater than 2 will also be $O_p(n^{-1})$.

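It follows that the coverage probability may be computed using only two terms of the Taylor series expansion for the normal c.d.f.:

$$P\{\sqrt{n}a'(\hat\beta(\tau) - \beta(\tau)) \le z_\alpha\sqrt{n}s_a(\hat\delta)\}$$
$$= P\{a'(W + R_n) \le z_\alpha\sqrt{n}s_a(\delta) + K\}$$
$$= E\,\Phi_{a'W\mid Z}\bigl(z_\alpha\sqrt{n}s_a(\delta) + K - a'R_n\bigr)$$
$$= E\bigl\{\Phi_{a'W\mid Z}(z_\alpha\sqrt{n}s_a(\delta)) + \varphi_{a'W\mid Z}(z_\alpha\sqrt{n}s_a(\delta))(K - a'R_n)$$
$$\qquad + \tfrac12\varphi'_{a'W\mid Z}(z_\alpha\sqrt{n}s_a(\delta))(K - a'R_n)^2 + O((\log n)^3/n)\bigr\}$$
$$= 1 - \alpha + T_1 + T_2 + O((\log n)^3/n).$$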

Note that the (normal) conditional distribution of W given Z is straightfor- 
ward to compute (using the usual asymptotic covariance matrix for quan- 
tiles): the conditional mean is a small constant (of the order of hn) times Z, 
and the conditional variance is bounded. 

Expanding the lower probability in the same way and subtracting provides some cancelation. The contribution of $R_n$ will cancel in the $T_1$ differences, and is negligible in subsequent terms since $R_n^2 = O((\log n)^3/n)$. Similarly, the $R_n^*/(\sqrt{n}h_n)$ term will appear only in the $T_1$ difference where it contributes a term that is $(\log n)^{3/2}$ times a term of order $1/(nh_n)$, and will also be negligible in subsequent terms. Also, the $h_n^2$ term will only appear in $T_1$, as higher powers will be negligible. The only remaining terms involve $Z/(\sqrt{n}h_n)$. For the first power (appearing in $T_1$), $EZ = 0$. For the squared $Z$-terms in $T_2$, since $\mathrm{Var}(b_1'Z)$ is proportional to $h_n$, $E(b_1'Z)^2/(nh_n^2) = c_1/(nh_n)$, and all other terms involving $Z$ have smaller order.
Therefore, one can obtain the following error for the coverage probability: for some constants $c_1$ and $c_2$, the error is

$$\frac{b_1'R_n^*}{\sqrt{n}h_n} + \frac{c_1}{nh_n} + c_2h_n^2$$

(plus terms of smaller order). Since $R_n^*$ is of order nearly $n^{-1/2}$, the first terms have nearly the same order. Using $h_n^{-1}R_n^* = c(\log n)^{3/2}/(\sqrt{n}h_n)$, it is straightforward to find the optimal $h_n$ to be a constant times $\sqrt{\log n}\,n^{-1/3}$, which bounds the error in the coverage probability by $O(\log n\,n^{-2/3})$. □
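
The bandwidth trade-off in the last step can be made concrete. Treating the error as $E(h) = c(\log n)^{3/2}/(nh) + c_1/(nh) + c_2h^2$, setting $E'(h) = 0$ gives $h^3 = (c(\log n)^{3/2} + c_1)/(2c_2n)$, so $h$ is proportional to $\sqrt{\log n}\,n^{-1/3}$ up to a factor tending to a constant. The sketch below uses hypothetical function names and takes all constants equal to one purely for illustration.

```python
import math

def coverage_error(h, n, c=1.0, c1=1.0, c2=1.0):
    """Model error: c*(log n)^(3/2)/(n h) + c1/(n h) + c2*h^2 (constants assumed)."""
    return c * math.log(n) ** 1.5 / (n * h) + c1 / (n * h) + c2 * h * h

def optimal_bandwidth(n, c=1.0, c1=1.0, c2=1.0):
    """Minimiser of a/h + c2*h^2 with a = (c*(log n)^(3/2) + c1)/n."""
    a = (c * math.log(n) ** 1.5 + c1) / n
    return (a / (2.0 * c2)) ** (1.0 / 3.0)

if __name__ == "__main__":
    for n in (10**4, 10**6, 10**8):
        h = optimal_bandwidth(n)
        rate_h = math.sqrt(math.log(n)) * n ** (-1.0 / 3.0)
        rate_err = math.log(n) * n ** (-2.0 / 3.0)
        # Both ratios stay bounded as n grows, matching the stated rates.
        print(n, h / rate_h, coverage_error(h, n) / rate_err)
```

At the minimiser the first two terms together equal $2c_2h^2$, so the minimal error is $3c_2h^2$, of order $(\log n)\,n^{-2/3}$.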